Goto

Collaborating Authors

 preprint arxiv


Efficient Diffusion Models under Nonconvex Equality and Inequality constraints via Landing

Jeon, Kijung, Muehlebach, Michael, Tao, Molei

arXiv.org Machine Learning

Generative modeling within constrained sets is essential for scientific and engineering applications involving physical, geometric, or safety requirements (e.g., molecular generation, robotics). We present a unified framework for constrained diffusion models on generic nonconvex feasible sets $Σ$ that simultaneously enforces equality and inequality constraints throughout the diffusion process. Our framework incorporates both overdamped and underdamped dynamics for forward and backward sampling. A key algorithmic innovation is a computationally efficient landing mechanism that replaces costly and often ill-defined projections onto $Σ$, ensuring feasibility without iterative Newton solves or projection failures. By leveraging underdamped dynamics, we accelerate mixing toward the prior distribution, effectively alleviating the high simulation costs typically associated with constrained diffusion. Empirically, this approach reduces function evaluations and memory usage during both training and inference while preserving sample quality. On benchmarks featuring equality and mixed constraints, our method achieves comparable sample quality to state-of-the-art baselines while significantly reducing computational cost, providing a practical and scalable solution for diffusion on nonconvex feasible sets.


You Don't Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning

Neural Information Processing Systems

All instantiations of this paradigm were trained using strong and well-established hand-crafted data augmentations, leading to the general belief that they are required for the proper training and performance of such models.



Supplementary Materials for NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning

Neural Information Processing Systems

Right: Normalized attention scores processed by two different normalization methods. Table 1: Performance of searched architectures using different NAS algorithms in DARTS [ 7 ] space on CIFAR-10 [ 5 ]. The inference latency was measured on a machine with GeForce RTX 3090 GPU. The batch size was set to 1. Encode(ms) Infer(ms) Total(ms) NAR-Former 2.4784 17.4864 19.9648 NAR-Former V2 2.3722 5.2276 7.5998 may be somewhat different. Due to the softmax, Eq. ( 5) focuses almost all attention on the current The Eq. ( 2) restricts attention to connected nodes by introducing the adjacency matrix.